742 research outputs found

    MPEG-4 tools and applications: an overview

    In this paper we present an overview of the software tools currently available for the creation and display of MPEG-4 content. We first describe tools for encoding raw video into MPEG-4 compliant bitstreams. We then describe how this content may be used to create a complete MPEG-4 scene containing both graphical and interactive elements in addition to the more usual video and audio elements. MPEG-4 content cannot be viewed without appropriate decoders and players, which are addressed in the third section of this paper. Finally, we demonstrate how these tools may be combined to create MPEG-4 applications by presenting the details of two sample applications we have developed.

    Region-based segmentation of images using syntactic visual features

    This paper presents a robust and efficient method for segmentation of images into large regions that reflect the real-world objects present in the scene. We propose an extension to the well-known Recursive Shortest Spanning Tree (RSST) algorithm based on a new color model and so-called syntactic features [1]. We introduce practical solutions, integrated within the RSST framework, to structure analysis based on the shape and spatial configuration of image regions. We demonstrate that syntactic features provide a reliable basis for region merging criteria which prevent formation of regions spanning more than one semantic object, thereby significantly improving the perceptual quality of the output segmentation. Experiments indicate that the proposed features are generic in nature and allow satisfactory segmentation of real-world images from various sources without adjustment to algorithm parameters.

    Dialogue scene detection in movies using low and mid-level visual features

    This paper describes an approach for detecting dialogue scenes in movies. The approach uses automatically extracted low- and mid-level visual features that characterise the visual content of individual shots, which are then combined using a state transition machine that models the shot-level temporal characteristics of the scene under investigation. The choice of visual features is motivated by a consideration of formal film syntax. The system is designed so that the analysis may be applied to detect different types of scenes, although in this paper we focus on dialogue sequences as these are the most prevalent scenes in the movies considered to date.
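
    As a rough illustration of how per-shot evidence might be combined by a state machine, the sketch below uses a small two-state machine (outside or inside a candidate scene). It is not the paper's machine: the boolean per-shot flag, the function name `detect_dialogue`, and the parameters `min_shots` and `max_gap` are all hypothetical, chosen only to show the shot-level temporal idea.

```python
def detect_dialogue(shots, min_shots=3, max_gap=1):
    """Return (first_shot, last_shot) index pairs for candidate dialogue scenes.

    `shots` is a sequence of booleans, one per shot, where True means the
    shot looked dialogue-like according to some visual feature (hypothetical).
    A candidate scene tolerates up to `max_gap` consecutive non-matching
    shots and is reported only if it holds at least `min_shots` matches.
    """
    scenes = []
    start = None      # first shot of the current candidate (None = outside)
    last_hit = None   # most recent dialogue-like shot in the candidate
    hits = 0          # number of dialogue-like shots in the candidate
    for i, is_dialogue_like in enumerate(shots):
        if is_dialogue_like:
            if start is None:          # transition: outside -> inside
                start, hits = i, 0
            hits += 1
            last_hit = i
        elif start is not None and i - last_hit > max_gap:
            # transition: inside -> outside, keep the candidate if long enough
            if hits >= min_shots:
                scenes.append((start, last_hit))
            start = None
    if start is not None and hits >= min_shots:
        scenes.append((start, last_hit))
    return scenes


example = [False, True, True, False, True, True, False, False, True, False]
print(detect_dialogue(example))  # [(1, 5)]
```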

    A multi-modal event detection system for river and coastal marine monitoring applications

    This work investigates the use of a multi-modal sensor network in which visual sensors such as cameras and satellite imagers, along with context information, can be used to complement and enhance the usefulness of a traditional in-situ sensor network in measuring and tracking some feature of a river or coastal location. This paper focuses on our work on the use of an off-the-shelf camera as part of a multi-modal sensor network for monitoring a river environment. It outlines our results on the estimation of water level using a visual sensor. It also outlines the benefits of a multi-modal sensor network for marine environmental monitoring and how this can lead to a smarter, more efficient sensing network.

    Complexity adaptation in H.264/AVC video coder for static cameras

    H.264/AVC uses variable block size motion estimation (VBSME) to improve coding gain. However, its complexity is significant and fixed regardless of the required quality or of the scene characteristics. In this paper, we propose an adaptive complexity algorithm based on the Walsh Hadamard Transform (WHT). VBS automatic partition and skip mode detection algorithms are also proposed. Experimental results show that 70% - 5% of the computation of H.264/AVC is required to achieve the same PSNR.

    Using Dempster-Shafer theory to fuse multiple information sources in region-based segmentation

    This paper presents a new method for segmentation of images into large regions that reflect the real-world objects present in a scene. It explores the feasibility of utilizing the spatial configuration of regions and their geometric properties (the so-called Syntactic Visual Features [1]) to improve the correspondence between segmentation results produced by the well-known Recursive Shortest Spanning Tree (RSST) algorithm [2] and the semantic objects present in the scene. The main contribution of this paper is a novel framework, based on the Dempster-Shafer (DS) theory [3], for integrating evidence from multiple sources into the region merging process, which allows integration of sources providing evidence with different accuracy and reliability. Extensive experiments indicate that the proposed solution limits formation of regions spanning more than one semantic object.
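
    To make the evidence-fusion idea concrete, here is a minimal sketch of Dempster's rule of combination for two sources. The frame of discernment (merge versus keep apart), the source names `colour` and `shape`, and the mass values are assumptions made for illustration; they are not taken from the paper, and the paper's framework may differ in how masses are assigned.

```python
from itertools import product

# Hypothetical frame of discernment for one candidate region pair.
MERGE = frozenset({"merge"})
KEEP = frozenset({"keep"})
EITHER = MERGE | KEEP  # mass placed here expresses ignorance


def combine(m1, m2):
    """Dempster's rule: fuse two mass functions keyed by frozenset."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to one.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Illustrative masses only: a less reliable source leaves more mass on EITHER.
colour = {MERGE: 0.6, EITHER: 0.4}
shape = {KEEP: 0.3, MERGE: 0.2, EITHER: 0.5}

fused = combine(colour, shape)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

    Leaving part of a source's mass on the whole frame is one common way to express that the source is only partly reliable, which matches the abstract's point about sources with different accuracy.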

    Fast intra prediction in the transform domain

    In this paper, we present a fast intra prediction method based on separating the transformed coefficients. The prediction block can be obtained from the transformed and quantized neighboring block that generates minimum distortion for the DC and AC coefficients independently. Two prediction methods are proposed: full block search prediction (FBSP) and edge based distance prediction (EBDP), both of which find the best-matched transformed coefficients in additional neighboring blocks. Experimental results show that the use of transform coefficients greatly enhances the efficiency of intra prediction whilst keeping complexity low compared to H.264/AVC.

    Low computational complexity variable block size (VBS) partitioning for motion estimation using the Walsh Hadamard transform (WHT)

    Variable Block Size (VBS) based motion estimation has been adopted in state-of-the-art video coding standards such as H.264/AVC and VC-1. However, a low complexity H.264/AVC encoder cannot take advantage of VBS due to its power consumption requirements. In this paper, we present a VBS partition algorithm based on a binary motion edge map that requires neither initial motion estimation nor Rate-Distortion (R-D) optimization for selecting modes. The proposed algorithm uses the Walsh Hadamard Transform (WHT) to create a binary edge map, which is more cost-effective in terms of computational complexity than other light segmentation methods typically used to detect the required region.
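
    The WHT is attractive for this purpose because it needs only additions and subtractions. The sketch below shows an unnormalised fast WHT in natural (Hadamard) order and a simple per-block edge flag. The decision rule in `has_edge` (sum of absolute non-DC coefficients compared with a threshold) and the threshold value are assumptions for illustration, not the paper's criterion.

```python
def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform, natural (Hadamard) order."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly: adds only
        h *= 2
    return out


def wht2d(block):
    """Separable 2-D WHT: transform every row, then every column."""
    rows = [fwht(r) for r in block]
    cols = [fwht(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def has_edge(block, threshold):
    """Hypothetical rule: flag the block if its non-DC energy is large."""
    coeffs = wht2d(block)
    total = sum(abs(c) for row in coeffs for c in row)
    return total - abs(coeffs[0][0]) > threshold


flat = [[10] * 4 for _ in range(4)]
step = [[0, 0, 100, 100] for _ in range(4)]  # vertical edge in the block
edge_map = [has_edge(b, threshold=50) for b in (flat, step)]
print(edge_map)  # [False, True]
```

    In a motion setting the input block could, for instance, be the difference between co-located blocks in consecutive frames; that choice is also an assumption here.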

    Automatic detection and extraction of artificial text in video

    A significant challenge in large multimedia databases is the provision of efficient means for semantic indexing and retrieval of visual information. Artificial text in video is normally generated in order to supplement or summarise the visual content and thus is an important carrier of information that is highly relevant to the content of the video. As such, it is a potential ready-to-use source of semantic information. In this paper we present an algorithm for detection and localisation of artificial text in video using a horizontal difference magnitude measure and morphological processing. The result of character segmentation, based on a modified version of the Wolf-Jolion algorithm [1][2], is enhanced using smoothing and multiple binarisation. The output text is input to an "off-the-shelf" non-commercial OCR. Detection, localisation and recognition results for a 20-minute MPEG-1 encoded television programme are presented.

    Scalable virtual viewpoint image synthesis for multiple camera environments

    One of the main aims of emerging audio-visual (AV) applications is to provide interactive navigation within a captured event or scene. This paper presents a view synthesis algorithm that provides a scalable and flexible approach to virtual viewpoint synthesis in multiple camera environments. The multi-view synthesis (MVS) process consists of four phases that are described in detail: surface identification, surface selection, surface boundary blending and surface reconstruction. MVS identifies and selects only the best quality surface areas from the set of available reference images, thereby reducing perceptual errors in virtual view reconstruction. The approach is independent of the camera setup and scalable, as virtual views can be created given 1 to N of the available video inputs. Thus, MVS provides interactive AV applications with a means to handle scenarios where camera inputs increase or decrease over time.